Robust Action Gap Increasing with Clipped Advantage Learning
Abstract
Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve robustness to estimation errors. However, the method becomes problematic when the optimal action induced by the approximated value function does not agree with the true optimal action. In this paper, we present a novel method, named clipped Advantage Learning (clipped AL), to address this issue. The method is inspired by our observation that increasing the action gap blindly for all given samples, while not taking their necessities into account, could accumulate more errors in the performance loss bound, leading to slow value convergence; to avoid that, we should adjust the advantage value adaptively. We show that our simple clipped AL operator not only enjoys a fast convergence guarantee but also retains proper action gaps, hence achieving a good balance between a large action gap and fast convergence. The feasibility and effectiveness of the proposed method are verified empirically on several RL benchmarks with promising performance.
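The contrast the abstract draws between plain AL and a clipped variant can be illustrated on a tabular toy problem. The sketch below is not the authors' implementation. The AL backup follows the standard form TQ(s,a) - alpha * (V(s) - Q(s,a)), while the clipping rule (apply the advantage term only when (Q(s,a) - q_lower) / (V(s) - q_lower) >= c) is an assumption made for illustration, since the abstract does not state the criterion. All names (`bellman_backup`, `al_backup`, `clipped_al_backup`, `q_lower`, `c`) and the toy data are hypothetical.

```python
from typing import List

Table = List[List[float]]         # Q[s][a] or R[s][a]
Kernel = List[List[List[float]]]  # P[s][a][s_next], rows sum to 1


def action_gap(q_row: List[float]) -> float:
    """Difference between the best and second-best action value in one state."""
    best, runner_up = sorted(q_row, reverse=True)[:2]
    return best - runner_up


def bellman_backup(Q: Table, R: Table, P: Kernel, gamma: float) -> Table:
    """(TQ)(s,a) = R[s][a] + gamma * sum over s' of P[s][a][s'] * max_b Q[s'][b]."""
    V = [max(row) for row in Q]
    return [
        [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def al_backup(Q: Table, R: Table, P: Kernel, gamma: float, alpha: float) -> Table:
    """Advantage learning: subtract alpha * (V(s) - Q(s,a)) from every backup."""
    TQ = bellman_backup(Q, R, P, gamma)
    return [
        [TQ[s][a] - alpha * (max(Q[s]) - Q[s][a]) for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def clipped_al_backup(Q: Table, R: Table, P: Kernel, gamma: float,
                      alpha: float, q_lower: float, c: float) -> Table:
    """Assumed clipping rule: keep the advantage term only when
    (Q(s,a) - q_lower) / (V(s) - q_lower) >= c; otherwise use the plain backup."""
    TQ = bellman_backup(Q, R, P, gamma)
    out = []
    for s, row in enumerate(Q):
        v = max(row)
        span = v - q_lower
        new_row = []
        for a, q in enumerate(row):
            apply_term = span > 0 and (q - q_lower) / span >= c
            penalty = alpha * (v - q) if apply_term else 0.0
            new_row.append(TQ[s][a] - penalty)
        out.append(new_row)
    return out


# Toy problem: one state with a self-loop, three actions, zero reward.
Q = [[1.0, 0.9, 0.1]]
R = [[0.0, 0.0, 0.0]]
P = [[[1.0], [1.0], [1.0]]]

al = al_backup(Q, R, P, gamma=0.5, alpha=0.5)
clipped = clipped_al_backup(Q, R, P, gamma=0.5, alpha=0.5, q_lower=0.0, c=0.5)
```

In this toy setting, plain AL penalizes both non-greedy actions, whereas the assumed clipping rule penalizes only the close competitor (value 0.9) and leaves the action that already trails far behind (value 0.1) with its ordinary Bellman backup. Whether this matches the thresholds used in the paper should be checked against the full text.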
Related resources
Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation an...
Clipped Action Policy Gradient
Many continuous control tasks have bounded action spaces and clip out-of-bound actions before execution. Policy gradient methods often optimize policies as if actions were not clipped. We propose clipped action policy gradient (CAPG) as an alternative policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that CAPG is unbias...
PAPR Advantage of Amplitude Clipped OFDM/TDM
OFDM combined with TDM (OFDM/TDM) can be used to reduce a high peak-to-average power ratio (PAPR) of OFDM, but the PAPR reduction is not sufficient. To further reduce the PAPR, an amplitude clipping can be applied. In this letter, we investigate the effect of clipping on OFDM/TDM with and without channel coding. It is shown that amplitude clipped OFDM/TDM has an advantage over clipped OFDM with...
Action-Gap Phenomenon in Reinforcement Learning
Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove...
Active Constellation Extension for Increasing the Capacity of Clipped OFDM
The paper studies the method active constellation extension (ACE) for increasing the channel capacity of orthogonal frequency-division multiplexing (OFDM) with a nonlinear power amplifier at the transmitter. It turns out that ACE is able to optimize the capacity but at the expense of an increased out-of-band radiation. In order to solve this issue, ACE is combined with the strategy clipping and...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i8.20900